OPEN 3 ACCESS Freely available online 



•0-PLOS I o-^E 



Genotyping of French Bacillus anthracis Strains Based on (jj^ 
31 -Loci Multi Locus VNTR Analysis: Epidemiology, Marker crossMark 
Evaluation, and Update of the Internet Genotype 
Database 

Simon Thierry\ Christophe TourtereP'^ Philippe Le Fleche^'^'^, Sylviane Derzelle\ Neira Dekhil\ 
Christiane Mendy\ Cecile Colaneri\ Gilles Vergnaud^'^'^, Nora Madani^* 

1 University Paris-Est, Anses, Animal Health Laboratory, Bacterial Zoonosis Unit, Maisons-Alfort, France, 2 Univ Paris-Sud, Institut de Genetique et Microbiologie, UMR 8621, 
Orsay, France, 3CNRS, Orsay, France, 4 Division of Analytical Microbiology, DGA CBRN Defence, Vert le Petit, France, 5DGA/MRIS- Mission pour la Recherche et 
rinnovation Scientifique, Bagneux, France 



Abstract 

Background: Bacillus anthracis is known to have low genetic variability. In spite of this lack of diversity, multiple-locus 
variable-number tandem repeat (VNTR) analysis (MLVA) and single nucleotide polymorphisms (SNPs) including the 
canonical SNPs assay (canSNP) have proved to be highly effective to differentiate strains. Five different MLVA schemes 
based on a collection of 31 VNTR loci (MLVA8, IV1LVA15, I\/1LVA20, I\/1LVA25 and IVILVASI) with increased resolving power 
have been described. 

Results: MLVA31 was applied to characterize the French National Reference Laboratory collection. The total collection of 
130 strains is resolved in 35 genotypes. The 1 19 veterinary and environmental strains collection in France were resolved into 
26 genotypes belonging to three canSNP lineages and four MLVA clonal complexes (CCs) with particular geographical 
clustering. A subset of seven loci (MLVA7) is proposed to constitute a first line assay. The loci are compatible with moderate 
resolution equipment such as agarose gel electrophoresis and show a good congruence value with MLVA31. The associated 
MLVA and SNP data was imported together with published genotyping data by taking advantage of major enhancements 
to the MLVAbank software and web site. 

Conclusions: The present report provides a wide coverage of the genetic diversity of naturally occurring B. anthracis strains 
in France as can be revealed by MLVA. The data obtained suggests that once such coverage is achieved, it becomes possible 
to devise optimized first-line MLVA assays comprising a sufficiently low number of loci to be typed either in one multiplex 
PCR or on agarose gels. Such a selection of seven loci is proposed here, and future similar investigations in additional 
countries will indicate to which extend the same selection can be used worldwide as a common minimum set. It is hoped 
that this approach will contribute to an efficient and low-cost routine surveillance of important pathogens for biosecurity 
such as S. anthracis. 
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introduction 

Bacillus anthracis is a spore forming Gram positive bacterium that 
causes antlirax, a zoonosis with a worldwide distribution. Anthrax 
is an acute infectious disease that may impact livestock, wUdlife 
and humans. The zoonosis is still endemic in many countries. 
Animals, especially ruminants, are infected by ingestion of 
soUborne spores while grazing [1]. In France, animal cases are 
regularly reported [2] . A few sporadic cases are recorded each year 
in areas where outbreaks have been reported in the past, and 
larger outbreaks occur every few years. Anthrax is a professional 
disease and humans are infected through exposure to animals or 



contaminated animal products when such material is incidentally 
ingested, inhaled or comes into contact with an open wound [3,4]. 
B. anthracis is also considered as a major potential biowarfare agent. 

The currently known B. anthracis population displays a low 
genetic variability and highly clonal evolution [5,6,7] indicative of 
a most recent common ancestor (MRCA) living less than a few 
tens of thousands years ago [8]. In the last decade, and owing to 
the development of whole genome sequencing technologies, an 
exhaustive exploration of genetic polymorphisms was achieved. 
Two classes of genetic markers are mostly used, variable number 
of tandem repeats (VNTRs) and single nucleotide polymorphisms 
(SNPs). These polymorphisms are assayed either by polymerase 
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chain reaction (PGR) targeted analysis of a list of VNTR or SNP 
loci or by whole genome sequencing [6,9,10,1 1,12,13,14]. Each of 
the two classes of genetic variations exhibit very different resolving 
potentialities and phylogenetic accuracy, related to their own 
specific mutation rate and number of possible allelic states: from 
the highly stable but with indi\ idually low resolution SNP to the 
more variable and homoplasic but with fine-scale resolution 
tandem repeat loci [15]. 

Thirty-one VNTR loci have been described in the B. anthracis 
genome (not including VNTRs with a one base-pair repeat unit, or 
Single Nucleotide Repeats (SNRs)) [14]. Several selections among 
these VNTR loci have been proposed to characterize B. anthracis 
[6,8,10,11,15,16]. In year 2000, the first designed MLVA system 
(MLVA8) targeted six chromosomal [vrrA, vrrBl, vrrB2, vrrCl, 
vrrC2, CG3) and two plasmid loci {pXOl, pX02). It was able to 
subtype a collection of over 400 B. anthracis strains into 89 
genotypes defining two major clonal lineages (A and B) [6]. Since 
then, this VNTR panel has been applied in numerous studies to 
examine the diversity of B. anthracis throughout the world 
[17,18,19,20,21,22,23,24,25]. It is a very robust and convenient 
assay, in spite of its limited discriminatory power for epidemiologic 
analysis. However it includes two loci located on the plasmids with 
two drawbacks. Firsdy the plasmids are sometimes absent in 
environmental strains as well as some collection strains [26,27] and 
secondly both loci have short (2 and 3 bp) repeat units, and 
consequently require high precision DNA fragment sizing 
equipment. 

In a search for higher discriminatory power, additional VNTR 
loci have subserjuently been identified. Le Fleche et al. [10] 
developed the Microbial Tandem Repeats database (http:// 
minisateUites.u-psud.fr) to faciUtate the identification and selection 
of tandem repeats and used it to propose a MLVA20 scheme for B. 
anthracis including the MLVA8 loci minus the two plasmid markers 
and 14 newly described polymorphic markers. The MLVA15 
scheme designed by Kcim and col. [8,15] is based upon the 
MLVA8 selection and seven additional loci (vntrl2, 16, 17, 19, 23, 
32 (alias bamsOl from [10]) and vntr35). Lista et al. [1 1] developed 
a 25 VNTR markers scheme (MLVA25), including MLVA8, plus 
13 of the 14 additional loci proposed by Le Fleche et al. [10] 
(bamsOl, 03, 05, 13, 15, 21, 22, 23, 24, 25, 28, 30 and 31; not 
included: bams07), and 4 new ones (bams34, 44, 51 and 53). The 
full set of 31 VNTR markers (MLVA31) present in die MLVA8, 
MLVA 15 or MLVA25 panels, has been recendy applied in 
concert with the typing of the set of SNPs called canSNPs [8] to 
evaluate the genotypic diversity found in Namibia [16]. MLVA 
provides each genotype in the form of a numeric code that can be 
easily stored in a database for comparison, as illustrated by the 
publicly accessible online database (MLVAbank at http://mlva.u- 
psud.fr/) [28,29] compiling the numerous genotypes identified in 
difierent studies [6,11,16,17,19,20,30,31,32]. 

Taking advantage of the knowledge and data acc:umulated in 
the past thirteen years, the main aims of the present study are (1) to 
provide an update of the genetic diversity of B. anthracis strains 
naturally present in France, using the most recendy developed 
MLVA3 1 assay, (2) to propose a selection of VNTR loci which 
could represent a reasonable first-line assay for B. anthracis MLVA 
genotyping (3) to present the new B. anthracis genotyping database 
made by taking advantage of major developments to the 
underlying MLVAbank software. The first-line assay should 
contain a minimum number of loci, compatible with typing using 
a large variety of DNA sizing techniques including basic agarose 
gel electrophoresis. In the present study, we have used the 
MLVA31 scheme in combination with canSNP analysis [7,8] on 
the complete ANSES collection of strains collected from different 



regions of France. The discriminatory power of the MLVA31 
scheme and the contribution of individual VNTR markers were 
evaluated to optimize the number of markers required to 
accurately resolve the genetic diversity found across French 
strains. Congruence and linkage disequilibrium analysis were 
conducted for different panels of VNTR marker [33] and the 
different MLVA schemes were compared. Because databases are a 
key issue in terms of genotyping, we have significantiy improved 
the software behind the http://mlva.u-psud.fr prototype by 
introducing three major functionalities: firstly, the new version is 
able to merge a number of independently curated databases so 
that they can be queried and browsed as a single entity, secondly, 
the database can host any kind of numeric genotyping data, such 
as canSNPs, and thirdly, a tool has been included to automatically 
deduce a MLVA or SNP genotype from genome sequence data. 

Materials and Methods 

Ethics statement 

The strains used in this study were isolated by the French 
National Reference Laboratory (NRL) for animal anthrax. The 
NRL is a public laboratory, mandated by the Ministry of 
Agriculture to confirm diagnosis of all animal anthrax cases in 
France. During an outbreak, samples are taken by the veterinary 
services of the Ministry of Agriculture. Specific permission for soil 
sampling was not required. No human strains were included in this 
study. 

Bacterial strains 

A total of 1 30 B. anthracis strains, including eleven strains from 
external collections (ten strains acquired from the "Collection de 
ITnstitut Pasteur" (CIP, Paris, France) (17JB, CIP 53.169, CIP 
66.17, CIP 74.12, CIP 77.2, CIP 81.89, CIP A204, CIP A205, 
CIP A206, CIP A211), one strain of uncertain origin from the 
'Tnstitut d'Elevage et de Medecine Veterinaire des pays 
Tropicaux" (lEMVT 89-1620)) and 119 strains collected in 
France between 1982 and 2010, were included in this study. One 
hundred and six of the 1 1 9 strains are from animal origin: bovine 
(n = 101), ovine (n = 2), equine (n - 2) and caprine (n = 1). The last 
thirteen were collected from the environment. 

VNTR typing 

The previously described MLVA31 scheme [16] was used. This 
collection of 31 loci represents all VNTR loci in the MLVA8, 
MLVA15 and MLVA25 schemes [6,11,15] (Table 1). The PCR 
reactions for the MLVA25 loci were performed in four multiplex 
reactions A-D using the PTC200 thermocycler (Bio-Rad, Marnes- 
la-Coquette, France)) as described by Lista et al. [1 1]. Multiplex A 
amplifies vrrB2, vrrCl, CG3, bamsOl (alias vntr32), bams03, 
bams05, bams 15, bams44. Multiplex B amplifies vrrBl, vrrC2, 
bams 13, bams28, bams31, bams53. Multiplex C amplifies vrrA, 
bams21, bams24, bams25, bams34. Multiplex D amplifies pXOl, 
pX02, bams22, bams23, bams30, bams51. The fluorescent 
primers were made using WeURED D2, D3 and D4 (Sigma- 
Aldrich, Saint-Quentin Fallavier, France). The six additional loci 
(vntrl2 (labeled with D4), vntrl6 (D2), vntrl7 (D3), vntrlO (D2), 
vntr23 (D3), vntr35 (D3)) were multiplexed in a single PCR 
subsequently called E. The PCR products were diluted 1:10 and 
2 |J.l of the dilution were added to a mix containing 30 |J,1 of 
Sample Loading Solution (SLS, Beckman Coulter, FuUerton, CA., 
USA) and 0.25 |xl of size marker. The samples were separated by 
electrophoresis in CEQ, Separation Gel LPA I on a CEQ, 8000 
automatic DNA Analysis System (Beckman Coulter, FuUerton, 
GA., USA) under the foUowing conditions: denaturation at 90°G 
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for 120 sec, injection at 2.0 kV for 30 sec, separation at 4.8 kV for 
60 min (multiplexes A, C, E) or 6 kV for 70 min (multiplexes B 
and D). Allele sizes were estimated using the CEQ^ Fragment 
Analysis System software, by comparing the amplicon to the size 
marker "DNA Size standard kit-600" (multiplexes A, C, E; 
Beckman Coulter, FuUerton, CA., USA) or a custom-made "CST 
Dl 100-1150 bp" (multiplexes B, D; Bioventures, Murfreesboro, 
TN, USA). 

VNTR allele coding convention 

We have used the numeric coding convention for the 8 loci 
constituting MLVA8 proposed by Keim et al. [6] except for vrrC 1 . 
vrrC 1 was initially considered as a 36 bp repeat unit, but it varies 

Table 1. List of 32 published B. anthracis VNTR markers. 



in a more complex way. Lista et al. proposed to consider vrrC 1 as 
a 9 bp repeat unit VNTR, and assigned a 53 U allele code to the 
Ames ancestor genome accession number NC_007530.2. We have 
used the Lista et al. proposal [11] for the additional 17 loci 
constituting MLVA25 and Antwerpen et al. [26] for the additional 
6 loci constituting MLVA3 1 (including MLVA 1 5 by Van Ert et al. 
[8]; see Data SI for more details on the coding conventions and 
comparison with alternative published coding conventions). 
Datasets using different conventions can be adjusted as far as 
the coding convention applied in silico to a sequenced genome is 
indicated. The convention as applied in silico to B. anthracis strain 
'Ames Ancestor' accession numbers NC_007530.2 (chromosome). 





Locus 


Forward primers (5 -3 ) 


Reverse primers (5 -3 ) 


Comment 


vrrA 1 7hn ^14hn 411^ 


rArAArTArrArrf;ATf;f^rArA 


Grf;rGTTTrGTTTGATTrATAr 




vrrRI Qhn :?:?Qhn 7011^ 


ATAf^f^Tf^f^ 1 1 1 irrfirAAfiTTATTr 

r\ \ r\\3\3 \ VJvj 1 1 1 1 v_V_vj\_rArAVj 1 1 r\ 1 1 V_ 


GATGAGTTTGATAAAGAATAGrrTGTG 




vrrR7 Qhn 1 S^hn 1^11^ 


rArAf^f^rTATTrTTTATrAAArrrATr 


rrrAAf^GTf^AAGATTGTTGTTGA 




wrrfl Qhn SROhn S^ll^ 


fiAAr;rAAriAAAr;TriAT(^TAriTrir;Ar 


rATTTmrAAriTGrTArAGGTTr 




MrrO IRhn S^9hn 1711^ 


rrAriAAr;AA(^Tr;(^AArrTriTAr;rAr 


GTrTTTrrATTAATrGrGrTrTAxr 




CG3 5bp 158bp 2U^ 


TGTCGTTTT ACTTCTCTCTC C AATAC 


AGTCATTGTTCTGTATAAAGGGCAT 




nyni-;^;^t ^hn 1 'Jfihn 711^ 


rAATTTATTAArnATrAf^ATTAAGTTrA 


TrTAGAATTAGTTGrTTrATAATGnr 

1 V_ 1 r\\3r\r\ \ \ r\\3 \ \ \3\- \ \ V_r\ 1 r\r\ \ VJVJV- 




nyn?-3t 7hn 141hn Qll^ 


TrATrrTrTTTTAAGTrTTfiriGT 


GTGTGATGAArxrrGArGArA 




ham<;m 51hn 4RShn Ifil)'^ 


(^TTGAGrATGAGAGGTArrTTGTrf II 1 1 1 


AGTTrAAGrGrrAGAAGGTTATGAGTTATr 


Ali;^^ wntr3'?'" 


h;^m<;n^ IShn S4Qhn l'^ 


GrAGfAAfAriAAAArTTrTrTrrAATAArA 

\JVwrtVJv_/A/Av_/A\Jrtr\r\rAV_ 1 1 v_ 1 \_ 1 ^V_rArA 1 r\r\\-r\ 


TrrTrrrTGAGAArTGrTAxrArrTTTAAr 




hamtins ^Qhn ^n7hn Sll^ 


GrAGGAAGAArAAAAGAAArTAGAAGAGrA 


ATTATTAGrAGGGGmrTrrTGrATTArr 




h3m<;n7 IRhn ini7hn Sflll'^* 


fiAATATTrriTGrrArrTAArAAAArAGAAA 


TGTrAGATrTAGTTGGrrrTAri 1 1 irrxr 


bclE^ 


bamsl3_9bp_814bp_70U'' 


AATTGAGAAATTGCTGTACCAAACT 


CTAGTGCATTTGACCCTAATCTTGT 


bcIA'' 


bamsl 5_9bp_41 8bp_24U'' 


GTATTTCCCCCAGATACAGTAATCC 


GTGTACATGTTGATTCATGCTGTTT 


bcID'^ 


bams21 _45bp_676bp_l OU'' 


TGTAGTGCCAGATTTGTCTTCTGTA 


CAAATTTTGAGATGGGAGTTTTACT 




bams22_36bp_735bp_l 6U'° 


ATCAAAAATTCTTGGCAGACTGA 


ACCGTTAATTCACGTTTAGCAGA 




bams23_42bp_651 bp_l 1 U"" 


CGGTCTGTCTCTATTATTCAGTGGT 


CCTGTTGCTCCTAGTGATTTCTTAC 




bams24_42bp_595bp_l 1 U'' 


CTTCTACTTCCGTACTTGAAATTGG 


CGTCACGTACCATTTAATGTTGTTA 




bams25_l 5bp_391 bp_l 3U'' 


CCGAATACGTAAGAAATAAATCCAC 


TGAAAGATCTTGAAAAACAAGCATT 




bams28_24bp_493bp_14U'' 


CTCTGTTGTAACAAAATTTCCGTCT 


TATTAAACCAGGCGTTACTTACAGC 




bams30_9bp_727bp_57U'' 


GCATAATCACCTACAACACCTGGTA 


CAGAAAATATTGGACCTACCTTCC 


bclB"^ 


bams31_9bp_772bp_64U'' 


GCTGTATTTATCGAGCTTCAAAATCT 


GGAGTACTGTTTGTTGAATGTTGTTT 


bcIC 


bams34_39bp_503bp_l 1 U'' 


CAGCAAAATCAATCGAATCAAA 


TGTGCTAAATCATCTTGCTTGG 




bams44_39bp_41 7bp_8U'' 


GCGAATTAATTGCTCCTCAAAT 


GCACTTGAATATTTGGCGGTAT 




bams51_45bp_493bp_9U'' 


ATTTCCTGAAGCAGGTTGTGTT 


TGCATCTAACAATGCAGAACAA 




bams53_l 2bp_236bp_8U'' 


GAGGTGTGTTAGGTGGGCTTAC 


CATATTTTCACCTTAATTTTGGAAG 




vntrl2_2bp_115bp_6U' 


CGTACGAAGTAGAAGTCATTAA 


GCATATAATTGCACCTCATCTAG 




vntrl 6_8bp_273bp_20U' 


CTCTTGAAAATATAAAACGCA 


GAATAATAAGGGTTCTCATGGTAT 


pX02 plasmid 


vntr17_8bp_386bp_4U' 


TAGGTAAACAAATTTTCGTAATC 


GATCGTACAACAGCAATTATCAT 


pX02 plasmid 


vntrl 9_3bp_96bp_4U' 


GTGATGAAATCGGACAAGTTAGGAG 


GAAATATTTTATTAAACATGCTTTCCATCC 




vntr23_1 2bp_1 97bp_4U' 


TTTAGAAACGTTATCACGCTTA 


GTAATACGTATGGTTCATTCCC 




vntr35_6bp_115bp_5U' 


AAATAATATGTTCCTTTTGCTG 


GTCCTGAAATAAATGCTGAAT 





'Keim et al. 2000 [6]; 
''Le Fleche et al. 2001 [10]; 
=Keim et al. 2004 [151; 
■"Lista et al. 2006 [11]; 
'Leski et al. 2009 [43]. 

*Locus not currently used due to some large amplicon sizes [10]. The numerical allele coding convention is as published by Keim et al., [6], Lista et al. [11] and 
Antwerpen et al. [26], see Data SI for more details on the coding conventions and comparison with alternative coding conventions. 
doi:l 0.1 371/journal.pone.0095131 .tool 
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NC_007322.2 (pXOl plasmid) and NC_007323.2 (pX02 plas- 
mid) is recalled in Table 1 and Data SI and S2. 

Discriminatory power 

The discriminatory power was calculated by using Simpson's 
diversity index (DI) [34] . A Dl-value of 1 indicates that the typing 
method is able to discriminate between aU isolates. A Dl-value of 0 
indicates that all isolates are identical. The discriminatory power 
of individual loci was calculated on a dataset comprising only one 
strain per MLVA31 genotype. The discriminatory power of the 
different MLVA panels was calculated on the full dataset. 

Data analysis 

MLVA clustering was performed using the BioNumerics 
software package version 6.6 (Applied-Maths, Sint-Martens- 
Latem, Belgium). Data were analysed as a character dataset using 
the categorical distance coefficient. Clustering was achieved either 
with UPGMA (Unweighted Pair Group Method with Arithmetic 
mean) or Minimum Spanning Tree (MST). The priority rule for 
constructing MST was set so that the type that had the highest 
number of single-locus variants (SLVs) would be linked first. A 
cutoff value of 85% of similarity was apphed to define clonal 
complexes (CC). 

Congruence between different experiments was measured by 
BioNumerics using the categorical distance coefficient. Linkage 
disequilibrium was measured by using LL\N version 3.5 software 
(http://guanine.evolbio.mpg.de/) as described by Haubold & 
Hudson [35] . The linkage disequilibrium analysis for each MLVA 
panel was conducted on a dataset comprising one strain per 
MLVA genotype. The Monte-Carlo simulation was run with 
10000 iterations. 

The contribution of each marker to the global standardized 
index of association (Ia^) was calculated as the difference between 
the MLVA31 and the MLVA30 Ia^ without this marker. 

Suggestions for optimized panels were determined using the 
Automated Selection of Typing Target Subsets (AuSeTTS) 
Analysis [36]. 

Online database 

MLVA31 data generated in this study can be accessed in the 
"Bacillus anthracis" database at MLVAbank http: / / mlva.u-psud.fr/ . 
MLVAbank is a demo project first described by Le Fleche et al., 
2002 [37] and the previous major modifications were described by 
Grissa et al., 2008 [29]. The 2008 MLVAbank version aUowed the 
making by registered users of private or public MLVA databases 
(registration is free). Unregistered users can query public databases. 
For the present project, three major functionalities have been 
added. Firstly MLVAbank can now store any kind of genotyping 
data, including multiple locus sequence typing (MLST), SNPs, and 
Clustered Regularly Interspaced Short Palindromic Repeats 
(CRISPR) derived data [38]. Secondly, different databases can be 
declared as being part of a cooperative database, after a number of 
conventions have been agreed upon, regarding field names, 
genotyping markers names, allele coding conventions. Thirdly, an 
in silico typing tool has been included, which can deduce MLVA and 
SNP genotypes from whole genome sequence data. The application 
is coded using Javascript. The input is PCR primers information (for 
MLVA typing) or SNP positions. A tutorial is available via the 
MLVAbank web page, and an overview is presented as File S3. 

MLVAbank provides minimum dendrogram tracing tools, 
based on the categorical distance coefficient and UPGMA 
method. In addition MEGA format distance matrix or Newick 
format tree text files can be exported for analysis with the MEGA 
software [39] and FigTree [http://tree.bio.ed.ac.uk] respectively. 



Results 

MLVASI on 130 strains of B. anthracis 

The discriminatory power of MLVA31 in this collection is 
0.8874 (confidence interval 0.8491-0.9257) as compared to 0.7259 
(0.6672-0.7846) for MLVA8, 0.8370 (0.8065-0.8675) for 
MLVA15 and 0.8831 (0.8436-0.9226) for MLVA25. MLVA31 
resolves 35 genotypes (Data S2), fourteen contain more than one 
strain. The eleven external collection strains contribute nine 
genotypes whereas the 1 1 9 strains collected in France during the 
last three decades fall into 26 genotypes distributed in four clusters 
(minimum spanning tree. Figure la; the geographic origin of the 
strains is shown in Figure lb). The geographical repartition of the 
different genotypes and clonal complexes (CCs) is presented on the 
map of France (Figure 2 and Data S4) which illustrates the 
congruence between geographic origin and genetic clustering. 

The largest cluster is CCl and corresponds to MLVA8 GT79 
[6,17]. Seventy-five percent of the strains belonging to CCl (44 
out of 59) originate from three departments (Savoie and Haute- 
Savoie in the Alpes; Cantal in Massif Central). Strains from nearby 
Saone-et-Loire (n = 6), Puy-de-Dome (n = 3), Isere (n = 3), Aveyron 
(n = 4) and AUier (n = 1) also cluster within CCl. Six genotypes 
compose this major cluster. AU strains but one from Pyrenees- 
Atlantiques (ten out of eleven) constitute CC2 which also 
corresponds to MLVA8 GT80. CC3 includes all strains from 
Doubs (n = 21). CC3 and the A.Br.001/002 subgroup appear as 



CC4 1CT3, 6, la, lb, 20a) 




I Aveyron (n=6) 
I Cantal (n=8) 

C6te-dOr(n=10) 

Doubs (n=21) 
I Pyrenees-Atlantiques (n=11 ) 
I Savoie (n=31) 

Saone-et-Loire (n=6) 
I Haute-Savoie (n=3) 

Other regions (n=23) 



Figure 1 . (a) Minimum spanning tree of MLVA31 data from 119 
animal and environmental B. anthracis strains. Each circle 
represents a unique genotype. The diameter of each circle corresponds 
to the number of isolates with the same genotype. Genotypes 
connected by a shaded background differ by a maximum of 3 of the 
31 VNTR markers and could be considered as a "clonal complex". Thick 
connecting lines represent one locus differences; regular connecting 
lines represent two loci differences. The length of each branch is 
proportional to the number of differences. Each epidemiological 
situation is represented by a specific color as defined in part b.(b). 
Localization of the 119 animal and environmental 6. anthracis strains. 
doi:10.1 371/journal.pone.00951 31 .gOOl 
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^ ="30 samples 

• =20 samples 

• =10 samples 

• = 5 samples 
- = 1 sample 



Figure 2. Geographical repartition of tlie strains for eacKi clonal complex (CC). ^CC1 strains. \C2 (black circle), and CC3 (grey circle) strains. 
■^004 strains. 

doi:10.1371/journal.pone.0095131.g002 



minor groups that are geographically restricted to the east of 
France. Field specimens clustered into CCS were aU isolated from 
a recent episode associated with the death of 39 animals in the 
Doubs departement. 

CC4 is the most diverse cluster and corresponds to MLVA8 
GTS, 6, la, lb and 20a [6,17]. It accounts for 17 genotypes, 
comprising 27 strains including all ten strains from Cote-d'Or, two 
strains from Aveyron, one of the eleven strains from Pyrenees- 
Adantiques and 15 strains from various regions. Four external 
collection strains appear to be related to CC4 (CIP A2()4, CIP 
A205, CIP A206 and 17JB) whereas the seven others are 
genetically distinct from the four CCs (data not shown). The 
geographic localization of strains of each CC is illustrated in 
Figure 2. Fifty-nine- percent (71 of the 119 strains) occurred in four 
departements (Savoie, Pyrenees-Atlantiques, Doubs, Cantal). The 
CC 1 is primarily found in south-eastern France (Figure 2a) while 
CC2 and CCS are restricted to a single arc-a (Figure 2b). 
Genotypes clustered within CC4 are observed throughout the 
country (Figure 2c), and are associated with sporadic outbreaks. 

IV1LVA31 and canonical SNP analysis comparison 

Most strains (128 among 130 strains) belong to three canSNP 
subgroups, A.Br.001/002, A.Br.008/009, and B.Br.CNEVA. The 
last two are external collections strains belonging respectively to 
lineages A.Br.005/006 and A.Br.VoUum not naturally present in 
France [40]. Strains clustered within CCl and CC2 are affiliated 
with the B.Br.CNEVA sub-lineage, while strains clu.stered within 
CCS and CC4 are part of the A.Br.001/002 and A.Br.008/009 
sub-groups, respectively (Figure la). 

B.Br.CNEVA is the main sub-lineage found in France, with a 
majority of strains affiliated with this lineage. BamsOl, vrrBl, 
bams 15 and bams2S appeared as defining diagnostic markers for 
the sub-lineage present in France, with unique number of repeat 
copies (14, 19, 42 and 10, respectively). Three other VNTR loci 
(bams22, bams34 or bams51) highlighted regional clustering 
patterns, suggesting a successful establishment and spatial differ- 
entiation of B.Br.CNEVA strains in France. For instance, strains 
from the Pyrenees (CC2) were characterized by 9 repeat units 
alleles for marker bamsS4 and 8 repeat units alleles for marker 
bamsSl, whereas strains from the Alps (CCl) had 13 repeat units 
for marker bams22. 

The most diverse CC4 represents the second group of strains 
and two-thirds of the genotypes. French strains affiliated with CC4 
and the TransEurasian group (A.Br.008/009) originated from 
episodic outbreaks occurring throughout the country, with a 
particular spot in the Cotes-d'Or departement. AU CC4 genotypes 
share the same CG3 allele previously described for most strains 
belonging to the A.Br.008/009 and A.Br.WNA canSNP types 
[15]. The five-nucleotide sequence of CGS is present in only one 



copy. The most discriminative VNTR loci within CC4 were 
bamsSO (six different alleles), followed by bams 15, and bams05 
(four different alleles). Here again, the exosporium coding VNTRs 
appear to be among the most variable genetic elements within B. 
anthracis. 

Individual VNTR markers evaluation 

The diversity index of each VNTR marker and contribution to 
the standardized index of association (I^'') was calculated on a 
dataset using one strain per MLVA31 genotype (Data S2 and 
Table 2). Three panels of markers can be distinguished (Figure 3). 
Panel A contains the 18 loci with positive Ia^ contribution. 
Diversity indexes of these loci vary from 0.30 to 0.77. Panel B 
includes six markers (vrrA, bams05, bams 13, bams 15, bamsSO, 
bamsS 1 ) with a similar diversity range but a negative contribution 
to Ia . Panel C is composed of seven markers showing negative 
contribution to l.x' and low diversity, comprising three mono- 
morphic markers (bams21, bams25 and bams28) in addition to 
vrrB2, bams24, bams44 and vntrl9. 

MLVA7 scheme 

Marker evaluation allowed to select an optimised panel of seven 
VNTR markers among those most suitable for agarose gel typing, 
and the intermediate sizing resolution capillary electrophoresis 
such as Qiaxcel [41]. Sixteen loci have a repeat unit size (more 
than 1 0 bp) and observed allele size range compatible with agarose 
gel typing. The newly defined MLVA7 scheme includes the 
following agarose-friendly markers: vrrA, bamsOS, bams05, 
bams22, bams34, bams44, and vntr2S (Figure 4). None of these 
loci are located on the plasmids. Two are common to the 
MLVA15 assay. The MLVA7 panel resolves as many genotypes as 
the sixteen agarose-friendly loci, 1 4 genotypes in the 1 1 9 strains 
from France and 19 genotypes in the full collection of 130 strains 
(diversity index, 0.8584 confidence interval 0.82S9-0.8929). 

Comparison of MLVA schemes (MLVA7, MLVA8, MLVA1 5, 
MLVA25 and MLVA31) on the "France" and "Namibia" 
MLVA datasets 

Beyer et al. have published an extensive description of B. 
anthracis genetic diversity in Namibia as assayed by MLVASl 
analysis [16]. The global diversity index of the assay calculated on 
the full dataset is recalled in Table 2, together with individual loci 
diversity calculated on a dataset comprising one strain per 
genotype. The situation in France and Namibia is similar with 
few canSNP lineages represented. Figure 5 illustrates the behavior 
of the different VNTR loci in both countries. Whereas some loci 
are similarly variable in both datasets (e.g. pXOl, pX02, bamsOS, 
bams 13 etc), others behave strikingly differentiy. Usually the 
diversity of this second group of VNTRs is higher in the French 
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Figure 3. Graphic comparison of the discriminatory power and the contribution to the standardized index of association (U^) for 
each VNTR mariner. 

doi:10.1371/journal.pone.0095131.g003 



dataset, but this however simply reflects the presence of both 
cluster A and cluster B strains in the French dataset (Data S2). 

The diversity index DI of each MLVA scheme calculated on 
both collections of strains and the standardized index of linkage 
association calculated keeping one strain per genotype are given in 
Table 2. The non-null standardized index of linkage association is 
in agreement with the clonal structure of the B. anthracis 
population. 

Direct comparison of the different MLVA schemes showed a 
strong correlation between the clustering methods, as illustrated by 
the congruence analysis in Figure 6a for the "France" dataset. The 
congruence between MLVA31 and the MLVA25 scheme was 
99.7% for the set of 130 strains typed. The congruence between 
MLVA31 and MLVA15 was slighdy lower (96.1%) and much 
better than that determined with MLVA8 (85.5%). Interestingly 
MLVA7 demonstrated a discriminatory power (0.8584), Ia^ value 
(0.2323) and congruence with MLVA31 (96.5%) higher than both 
MLVA8 and MLVA15. 

The corresponding MLVA panels DI values deduced from the 
"Namibia" MLVA31 dataset [16] are indicated in Table 2. The 
global MLVA31 is significandy lower in the Namibia dataset as 
compared to the French dataset, but this might simply reflect a 
more extensive sampling in a geographically more focused area. 
The MLVA8 and MLVA 15 DIs are almost as high, and this is 
largely due to the pXO 1 and pX02 loci. In comparison, MLVA7 
has a much lower discriminatory power. Sixteen genotypes are 
resolved by MLVA7 (in comparison to 24 obtained with MLVA8 
and MLVA15, and 38 with MLVA31). The maximum of 18 
genotypes achievable with the 16 agarose-friendly loci would be 
reached by substituting bams21 to bams34, bams53 or vntr23. 
The congruence of MLVA7 with MLVA31 (95.8%) remains 
better as compared to MLVA8 or even MLVA 15 (Figure 6b). 



Making of the IVlLVAbank Bacillus anthracis database 

The B. anthracis database at http://mlva.u-psud.fr was updated 
taking advantage of the new version of the MLVAbank software, 
which allows the making of cooperative databases, the integration 
of SNP data, and the in silico analysis of whole genome sequence 
(Data S3). The current version includes a "iJ_an/Amcw_in_silico", 
"B_anthracis_20\2^', "B_anthracis_20l3" and "B_anthracis_20l'i" 
components. B_anthracis_m_siico was deduced from the in silico 
MLVA and SNP typing of the seven fully sequenced strains 
released to date. It also includes Bacillus cereus and Bacillus 
thuringiensis genomes showing an average sequence similarity with 
B. anthracis above 97%. B_anthracis_20l2 contains MLVA data 
recovered from articles on B. anthracis genotyping published up to 
year 2012, together with the associated SNP data. B_anthra~ 
cis_2013 contains data published in year 2013. B_atithracis_2i)\'^ 
presents the results available in Data S2. 

Before inclusion, published data has been checked and 
normalized when necessary so that the same allele calling 
convention is used. For instance. Van Ert et al. [8] used a 
lettering convention for MLVA15, and the numbering convention 
proposed by Keim et al. 2000 [6] and Antwerpen et al. 20 1 1 [26] 
was applied. Both Lista et al. 2006 [1 1] and Beyer et al. 2012 [16] 
used for some loci a convention that differed partly from 
previously pubhshed conventions, but the published data included 
in silico typing data from the Ames reference strain, allowing the 
recoding of the data set (see Data SI for additional consideration 
on the MLVA31 published allele coding conventions). 

Discussion 

Over the last decade, significant research efforts have been 
undertaken to develop appropriate genotyping methods for B. 
anthracis strain differentiation. The currently available methods 
take advantage of tandem repeat polymorphisms or point 
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Table 2. Discriminatory power and linkage disequilibrium analysis results for 5 sets of VNTR markers: the newly proposed MLVA7 
and MLVA8, MLVA15, MLVA25, MLVA31. 







MLVA7 


MLVA8 


MLVA15 


MLVA25 


MLVA31 


MLVA31* 


Number of loci 


7 


8 


15 


25 


31 


31 


Standardized index of 
association {1^^) French 
dataset 


0.2323 


0.1371 


0.2300 


0.3889 


0.4166 


ND 


Diversity Index (D/) 
(genotypes) 


0.8551 (19) 


0.7259 (14} 


0.8370 (17) 


0.8831 (34) 


0.8874 (35) 




Dl, Namibia dataset* 


0.3111 (16) 


0.7214 (23) 


0.7214 (24) 


0.7450 (38) 


- 


0.7450 (38) 


vrrA 


X 


X 


X 


X 


0.5895 


0.1522 


vrrBl 




X 


X 


X 


0.5091 


0.2347 


vrrB2 




X 


X 


X 


0.0306 


0.0526 


vrrCI 




X 


X 


X 


0.4078 


0.1522 


vrrC2 




X 


X 


X 


0.4036 


0.1024 


CG3 




X 


X 


X 


0.3893 


0.1024 


pXOl 




X 


X 


X 


0.6167 


0.7112 


pX02 




X 


X 


X 


0.5478 


0.8051 


bamsOl 






X 


X 


0.6093 


0.4950 


bams03 


X 






X 


0.6382 


0.2006 


bamsOS 


X 






X 


0.5095 


0.4950 


bams13 








X 


0.6130 


0.6828 


bamsIS 








X 


0.5883 


0.5021 


bams21 








X 


0.0000 


0.1494 


bams22 


X 






X 


0.6503 


0.3713 


bams23 








X 


0.5020 


0.5505 


bams24 








X 


0.0154 


0.0000 


bams25 








X 


0.0000 


0.0000 


bams28 








X 


0.0000 


0.0000 


bams30 








X 


0.6739 


0.4950 


bams31 








X 


0.6809 


0.6415 


bams34 


X 






X 


0.6549 


0.4723 


bams44 


X 






X 


0.0601 


0.5576 


bamsSI 








X 


0.5723 


0.0000 


bams53 








X 


0.5020 


0.0000 


vntr12 






X 




0.5020 


0.0526 


vntr16 






X 




07710 


0.1522 


vntr17 






X 




0.5200 


0.2802 


vntr19 






X 




0.3131 


0.1935 


vntr23 


X 




X 




0.4213 


0.2404 


vntr35 






X 




0.5038 


0.1024 



•Values as calculated from MLVA31 data published by [16]. 
doi:l 0.1 371/journal.pone.0095131 .t002 



mutations. A typing strategy relying on a combination of genetic 
markers that are progressively less stable but have increasing 
resolving power (SNP, VNTRs, including SNRs) has been 
recommended [15]. In this system canSNPs typing is used to 
estabhsh phylogenetic groups, which is followed by genotyping 
with MLVA. Once a representative set of strains will have been 
typed with both methods, MLVA data will be sufficient to robustly 
infer canSNP assignment. SNPs are evolutionary stable DNA 
signatures with low mutation rate (10~ changes per nucleotide 
per generation) and two allelic states. In addition canSNPs typing 
can be applied to very low amounts of DNA, and/ or degraded 



DNA, which can be essential in a forensic context [42]. VNTR 
loci are genomic regions with higher mutational rate (ranging from 
< 1 0 to > 1 0 insertion-deletion mutations per generation) and 
a higher number of possible allelic states (12 for the bams30 
marker in this study) [15], 

Different technologies can be used to assay these polymor- 
phisms, including whole genome sequencing [6,9,10,11,12,13,14]. 
In the coming years, genotyping will increasingly be achieved via 
whole genome sequencing owing to the advent of massively 
parallel sequencing technologies. At present, the most affordable of 
these technologies are not yet able to confidently reconstruct 
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vrrA bams03 bams05 bams22 bams34 bams44 vntr23 

12bp 15bp 39bp 35bp 39bp 39bp 12bp 



Figure 4. Electrophoresis gel of the IV1LVA7 panel on four strains. Well 1: Sterne strain, wells 2 to 4: French bovine strains. Migration on 3% 
standard agarose gel at 110V during 4 hours. A 1 00 bp ladder was used running from 1 00 bp up to 1 000 bp, the 500 bp and 1 000 bp bands are 
more intense. 

doi:1 0.1 371 /journal.pone.00951 31 .g004 



tandem repeat arrays, at least the larger ones but this is expected 
to change soon. As most laboratories dealing with B. anthracis will 
only encounter a few strains per year, systematic whole genome 
sequencing will then turn out to be the most cost-efficient way to 
characterize these strains. One could even imagine that whole 
genome sequence information can then be used to decide if a 
strain is worth being kept in collection, or can be destroyed to 
avoid maintenance cost associated with this kind of highly 
dangerous pathogen. 

However, there are a number of situations in which massively 
parallel sequencing will not remove the need for alternative, light- 
weight, fast and low cost genotyping tools. The first one is routine 
vahdation and control of strain identity for strain collection 
maintenance purposes. It may be necessary to have the capacity to 
genotype many colonies within the appropriate biosafety level 
laboratory, on a crude thermolysate. The second one is forensic 
microbiology, where traces of potentially degraded DNA must be 
analysed. In both contexts, targeted characterization of previously 
characterized polymorphic loci (VNTRs or SNPs) by PGR 
amplification might remain an essential tool. 

In the present work, the diversity of French B. anthracis was 
analysed in detail with MLVA31. The 130 strains were resolved 



into 35 different genotypes and four clonal complexes. This is very 
similar to the situation reported previously in Namibia [16]. The 
discrimination power achieved by MLVA3 1 was clearly improved 
compared to the MLVA8 assay. Interestingly the four VNTRs 
coding for key components of the B. anthracis exosporium (bams 13, 
bams 15, bams30 and bams31 [43]) are among the most diverse 
loci in both MLVA31 datasets currently available (France and 
Namibia). 

We tried next to optimize at least for the present collection of 
strains the number of VNTR loci required to genotype with useful 
resolution and accuracy the French diversity of strains and 
designed a shortened MLVA scheme. Based on the individual 
marker evaluation performed in this study, the more informative 
VNTR markers suitable for agarose gel-based analysis could be 
identified. This panel contains seven loci and can be typed either 
by monoplex PGR and agarose gel electrophoresis or by 
multiplexing in a single PGR followed by capillary gel electro- 
phoresis as previously illustrated for B. anthracis and other species 
[11,44,45]. 

The MLVA7 assay is similarly efficient in both datasets, an 
excellent correlation between MLVA7 and MLVA31 results was 
observed. Despite its lower discriminatory power compared to 
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Figure 5. Comparison of the diversity index (Dl) of individual loci in the "France" versus "Namibia" datasets. 

doi:10.1371/journal.pone.0095131.g005 
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a France 



b Namibia 




Figure 6. Congruence analysis of 5 sets of VNTR mariners: 
IV1LVA7, MLVA8, IVILVA15, iVILVA25 and MLVA31. a congruence 
analysis in the "France" dataset (this study); b congruence analysis in 
the "Namibia" dataset [16]. 
doi:1 0.1 371 /journal.pone.00951 31 .g006 

MLVA31 the MLVA7 scheme appears as a good ahernative for 
typing B. anthracis whenever capillary electrophoresis technology is 
not available or a quick genotype analysis must be performed. The 
proposed MLVA7 panel correctly identifies the main features of 
the population structure of the two distinct populations investi- 
gated here. For higher discriminatory power, two additional 
panels could be used, each one fitting in one multiplex PGR: the 
"exosporium loci" panel including bamslS, bamsl5, bamsSO and 
bamsSl and the "plasmid loci" panel including pXOI, pX02, 
vntrlG, vntrl7. 

MLVA31 data generated in this study were deposited in the 
"Bacillus anthracis" database which can be accessed at http://mlva. 
u-psud.fr/. This is an open and collaborative database on B. 
anthracis that has been created to share all avaitable MLVA (and 
SNP) typing data in a unique database for comparison testing and 
epidemiological studies. A VNTR and SNPs search tool is also 
provided to determine in silico the genotype of any strain based on 
its whole genome sequence. 

In conclusion, the worl? undertaken in the present study updates 
the genetic landscape of B. anthracis diversity in France from 
previous publications [17,40] and provides extended datasets 
about autochthonous strains that can be used for future 
epidemiological, epizootiological or preliminary forensic studies 
so that hypotheses can be made about strains origin. We expect 
that we have achieved now a correct coverage of the genetic 
diversity of B. anthracis naturally occurring in France and that the 
small assay which has been devised using this data can be 
confidendy used as a routine, first line assay for typing new French 
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